A lazy bagging approach to classification
Authors
Abstract
In this paper, we propose lazy bagging (LB), which builds bootstrap replicate bags based on the characteristics of test instances. Upon receiving a test instance xk, LB trims bootstrap bags by taking into consideration xk’s nearest neighbors in the training data. Our hypothesis is that an unlabeled instance’s nearest neighbors provide valuable information to enhance local learning and generate a classifier with refined decision boundaries emphasizing the test instance’s surrounding region. In particular, by taking full advantage of xk’s nearest neighbors, classifiers are able to reduce classification bias and variance when classifying xk. As a result, LB, which is built on these classifiers, can significantly reduce classification error, compared with the traditional bagging (TB) approach. To investigate LB’s performance, we first use carefully designed synthetic datasets to gain insight into why LB works and under which conditions it can outperform TB. We then test LB against four rival algorithms on a large suite of 35 real-world benchmark datasets using a variety of statistical tests. Empirical results confirm that LB can statistically significantly outperform alternative methods in terms of reducing classification error.
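The abstract describes the lazy bagging procedure at a high level: upon receiving a test instance, locate its nearest neighbors in the training data, fold them into each bootstrap bag, train a base classifier per bag, and combine the per-bag predictions. The following is a minimal Python sketch of that idea, not the authors' exact algorithm (the paper's choices for the neighborhood size, bag construction, and base learner differ); the decision-tree base learner, the parameters n_bags and k, the use of scikit-learn, and NumPy-array inputs are assumptions made only for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def lazy_bagging_predict(X_train, y_train, x_test, n_bags=10, k=20, rng=None):
    """Classify a single test instance with a lazily built bagging ensemble.

    The test instance's k nearest training neighbors are located and every
    bootstrap bag is forced to contain them, so each base tree refines the
    decision boundary around the test point before the ensemble votes.
    """
    rng = np.random.default_rng(rng)
    n = len(X_train)

    # Indices of the test instance's k nearest neighbors in the training set.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    neighbor_idx = nn.kneighbors([x_test], return_distance=False)[0]

    votes = []
    for _ in range(n_bags):
        # Ordinary bootstrap sample of the training data ...
        boot_idx = rng.integers(0, n, size=n - k)
        # ... augmented with the neighborhood of the test instance.
        bag_idx = np.concatenate([boot_idx, neighbor_idx])
        tree = DecisionTreeClassifier().fit(X_train[bag_idx], y_train[bag_idx])
        votes.append(tree.predict([x_test])[0])

    # Majority vote over the per-bag predictions.
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]
```

Because the ensemble is rebuilt for every query, this trades extra prediction-time cost for decision boundaries refined around each test instance, which is exactly the trade-off the abstract emphasizes.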
Similar resources
Improving text classification quality using a two-level classifier committee
Nowadays, automated text classification has gained particular importance due to the increasing availability of documents in digital form and the ensuing need to organize them. Although this problem belongs to the Information Retrieval (IR) field, the dominant approach is based on machine learning techniques. Approaches based on classifier committees have shown better performance than the others. I...
A local measurement-based protection scheme for DER integrated DC microgrid using Bagging Tree
In recent years, the DC microgrid has attracted considerable attention from the research community because of the wide usage of DC power-based appliances. However, the acceptance of DC microgrids by power utilities is still limited due to issues associated with the development of a reliable protection scheme. The high magnitude of the DC fault current, its rapid rate of rise, and the absence of zero cros...
Improving reservoir rock classification in heterogeneous carbonates using boosting and bagging strategies: A case study of early Triassic carbonates of coastal Fars, south Iran
Accurate reservoir characterization is a crucial task for the development of quantitative geological models and reservoir simulation. In the present work, a novel view of reservoir characterization is presented, combining the advantages of thin-section image analysis and intelligent classification algorithms. The proposed methodology comprises three main steps. First, four classes of...
An Effective Approach for Imbalanced Classification: Unevenly Balanced Bagging
Learning from imbalanced data is an important problem in data mining research. Much research has addressed imbalanced data by using sampling methods to generate an equally balanced training set and thereby improve the performance of prediction models, but it is unclear which class-distribution ratio is best for training a prediction model. Bagging is one of the most popular and effe...
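For context, the snippet above refers to sampling methods that give each bag an equally balanced class distribution. The sketch below illustrates only that conventional balanced-bagging baseline, not the paper's unevenly balanced variant; the decision-tree base learner, the bag count, and the strict 1:1 ratio are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def balanced_bagging_fit(X, y, n_bags=10, rng=None):
    """Fit one decision tree per bag, each bag resampled to a 1:1 class ratio.

    Illustrates the 'equally balanced training set' baseline: every class is
    sampled (with replacement) down to the size of the minority class.
    """
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    minority_size = counts.min()

    ensemble = []
    for _ in range(n_bags):
        idx = []
        for c in classes:
            c_idx = np.flatnonzero(y == c)
            # Draw minority_size examples from each class, with replacement.
            idx.append(rng.choice(c_idx, size=minority_size, replace=True))
        idx = np.concatenate(idx)
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble
```

Majority voting over the per-bag trees then yields the ensemble prediction.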
Lazy Bayesian Rules: A Lazy Semi-Naive Bayesian Learning Technique Competitive to Boosting Decision Trees
Lbr is a lazy semi-naive Bayesian classifier learning technique, designed to alleviate the attribute interdependence problem of naive Bayesian classification. To classify a test example, it creates a conjunctive rule that selects a most appropriate subset of training examples and induces a local naive Bayesian classifier using this subset. Lbr can significantly improve the performance of the naiv...
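The snippet describes the Lbr flow: build a conjunctive rule from attribute values of the test example, keep the training examples that satisfy it, and induce a local naive Bayesian classifier on the remaining attributes. Below is a heavily simplified Python sketch of that flow; the rule is passed in as a fixed attribute subset (rule_attrs) rather than selected greedily per test example as in Lbr, and using a Gaussian naive Bayes from scikit-learn instead of a categorical one is a simplifying assumption.

```python
from collections import Counter

import numpy as np
from sklearn.naive_bayes import GaussianNB

def lbr_predict(X_train, y_train, x_test, rule_attrs):
    """Heavily simplified Lbr-style prediction for a single test example.

    rule_attrs lists the attribute indices forming the conjunctive rule; the
    real Lbr algorithm chooses this subset greedily per test example and uses
    discrete attributes, both of which are simplified away here.
    """
    # Training examples that satisfy the rule, i.e. agree with the test
    # example on every attribute in the rule antecedent.
    mask = np.all(X_train[:, rule_attrs] == x_test[rule_attrs], axis=1)
    X_local, y_local = X_train[mask], y_train[mask]

    # Attributes not used in the rule feed the local naive Bayes model.
    rest = [a for a in range(X_train.shape[1]) if a not in rule_attrs]

    if len(y_local) == 0:
        # No local examples satisfy the rule: fall back to the global majority class.
        return Counter(y_train.tolist()).most_common(1)[0][0]
    if len(rest) == 0 or len(set(y_local.tolist())) < 2:
        # Degenerate cases: return the local majority class.
        return Counter(y_local.tolist()).most_common(1)[0][0]

    nb = GaussianNB().fit(X_local[:, rest], y_local)
    return nb.predict(x_test[rest].reshape(1, -1))[0]
```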
Journal: Pattern Recognition
Volume: 41, Issue: -
Pages: -
Publication year: 2008